Modeling presence or absence is a classic problem involving mixture models, specifically random variables that are zero-inflated. Extra zeros are encountered when we model presence or absence because zeros arise from two conditions: truly absent individuals and individuals present but undetected. This means we need a model for the process that controls occupancy, the true state, and model of the data that accounts for detection. This is often our starting point in Bayesian analysis – there a true, unobserved state we seek to understand using a model of a process. We take imperfect observations on that state and must correct them using a model of the data.
A fundamental question in landscape ecology seeks to understand how landscape structure shapes variation in the abundance of species. We will use data from the Swiss Survey of Common Breeding Birds, courtesy of Royle and Dorazio (2008), to model habitat occupancy by a common, resident bird in the Swiss Alps, the willow tit (Parus montanus). The data come from annual surveys of one km2 quadrats distributed across Switzerland (Fig. 1). Surveys are conducted during the breeding season on three separate days, but some quadrats have missing data so that the number of replicate observations is fewer than three.
During each survey, an observer records every visual or acoustic detection of a breeding species and marks its location using a global positioning system or, in earlier years, a paper map. Because we are observing a resident species during the breeding season, we assume that the true state (occupied or unoccupied) does not change among sample dates, an assumption known as closure.
Fig. 1. The willow tit (left, c/o of Francis C. Franklin) is one of 70 bird species that are surveyed annually for abundance in 267 1 km2 sampling units distributed across Switzerland (right, c/o the Swiss Ornithological Institute).
We want to understand the influence of forest cover and elevation on the distribution of the willow tit. The data frame SwissBirds has the number of times a quadrat (quadrat) was searched (numberVisits) and the number of times willow tits were detected (numberDetections). We have covariates on forest canopy cover (forestCover) as well as elevation in meters (elevation) for each quadrat surveyed. Develop a model of the influence of forest cover and elevation on the distribution of willow tits. Your model should allow estimation of the optimum elevation of willow tit habitat at the mean forest cover, where optimum elevation is defined as the elevation where probability of occupancy is maximum.
scale function to scale these covariates (this will drastically speed convergence).{ # Extra bracket needed only for R markdown files
sink("SwissBirds.R")
cat("
model{
# priors
p ~ dbeta(1, 1)
for (i in 1:4){
beta[i] ~ dnorm(0, .368)
}
# likelihood
for (i in 1:N){
z[i] ~ dbern(psi[i])
logit(psi[i]) <- beta[1] + beta[2]*elevation[i] + beta[3]*elevation[i]^2 + beta[4]*forestCover[i]
y[i] ~ dbin(z[i] * p, n[i])
}
# derived quantities
elevationMaxScaled <- -beta[2]/(2 * beta[3])
elevationMax <- elevationMaxScaled * sdElevation + muElevation
for (j in 1:length(elevationPredict)){
logit(psiPredict[j]) <- beta[1] + beta[2]*elevationPredict[j] + beta[3]*elevationPredict[j]^2
}
}
", fill = TRUE)
sink()
} # Extra bracket needed only for R markdown files
library(SESYNCBayes)
library(rjags)
## Loading required package: coda
## Linked to JAGS 4.2.0
## Loaded modules: basemod,bugs
set.seed(1)
elevation <- scale(SwissBirds$elevation)
muElevation <- attr(elevation, "scaled:center")
sdElevation <- attr(elevation, "scaled:scale")
elevationPredict = (seq(500, 2500, 10) - muElevation)/sdElevation
forestCover <- scale(SwissBirds$forestCover)
muforestCover <- attr(forestCover, "scaled:center")
sdforestCover <- attr(forestCover, "scaled:scale")
inits = list(
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)),
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)),
list(z = rep(1, nrow(SwissBirds)), p = runif(1, 0, 1), beta = runif(4, -2, 2)))
data = list(
N = nrow(SwissBirds),
elevation = as.double(elevation),
forestCover = as.double(forestCover),
n = as.double(SwissBirds$numberVisits),
y = as.double(SwissBirds$numberDetections),
muElevation = as.double(muElevation),
sdElevation = as.double(sdElevation),
elevationPredict = as.double(elevationPredict))
n.adapt = 5000
n.update = 10000
n.iter = 10000
jm = jags.model("SwissBirds.R", data = data, inits = inits, n.chains = length(inits), n.adapt = n.adapt)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 237
## Unobserved stochastic nodes: 242
## Total graph size: 4128
##
## Initializing model
update(jm, n.iter = n.update)
zm = coda.samples(jm, variable.names = c("p", "beta", "elevationMax"), n.iter = n.iter, n.thin = 1)
zj = jags.samples(jm, variable.names = c("elevationMax", "psiPredict"), n.iter = n.iter, n.thin = 1)
summary(zm)
##
## Iterations = 15001:25000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] -0.2067 0.27080 0.0015634 0.0040749
## beta[2] 1.9649 0.29719 0.0017159 0.0040711
## beta[3] -1.0894 0.26656 0.0015390 0.0048253
## beta[4] 0.8469 0.22889 0.0013215 0.0028650
## elevationMax 1792.7316 148.20604 0.8556679 2.2806875
## p 0.7865 0.02911 0.0001681 0.0002495
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] -0.7350 -0.3893 -0.2042 -0.0268 0.3289
## beta[2] 1.4194 1.7574 1.9532 2.1599 2.5691
## beta[3] -1.6348 -1.2627 -1.0808 -0.9073 -0.5887
## beta[4] 0.4131 0.6903 0.8410 0.9979 1.3098
## elevationMax 1589.6827 1695.2004 1767.4685 1859.5154 2141.1439
## p 0.7266 0.7674 0.7876 0.8070 0.8401
plot(zm)
gelman.diag(zm)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## beta[1] 1 1
## beta[2] 1 1
## beta[3] 1 1
## beta[4] 1 1
## elevationMax 1 1
## p 1 1
##
## Multivariate psrf
##
## 1
heidel.diag(zm)
## [[1]]
##
## Stationarity start p-value
## test iteration
## beta[1] passed 1 0.2501
## beta[2] passed 1 0.5592
## beta[3] passed 1 0.7667
## beta[4] passed 1 0.0573
## elevationMax passed 1 0.2824
## p passed 1 0.6063
##
## Halfwidth Mean Halfwidth
## test
## beta[1] passed -0.204 0.013722
## beta[2] passed 1.962 0.013390
## beta[3] passed -1.088 0.016077
## beta[4] passed 0.846 0.009912
## elevationMax passed 1791.077 7.159923
## p passed 0.787 0.000828
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## beta[1] passed 1 0.518
## beta[2] passed 1 0.527
## beta[3] passed 1 0.564
## beta[4] passed 1 0.981
## elevationMax passed 1 0.879
## p passed 1 0.201
##
## Halfwidth Mean Halfwidth
## test
## beta[1] passed -0.211 0.013800
## beta[2] passed 1.965 0.013416
## beta[3] passed -1.088 0.016776
## beta[4] passed 0.846 0.009438
## elevationMax passed 1794.770 8.062228
## p passed 0.786 0.000878
##
## [[3]]
##
## Stationarity start p-value
## test iteration
## beta[1] passed 1 0.349
## beta[2] passed 1 0.375
## beta[3] passed 1 0.235
## beta[4] passed 1 0.471
## elevationMax passed 1 0.245
## p passed 1 0.481
##
## Halfwidth Mean Halfwidth
## test
## beta[1] passed -0.205 0.013977
## beta[2] passed 1.968 0.014621
## beta[3] passed -1.092 0.016282
## beta[4] passed 0.848 0.009822
## elevationMax passed 1792.347 7.973452
## p passed 0.786 0.000835
par(mfrow = c(1, 2))
psi <- summary(zj$psiPredict, quantile, c(.025, .5, .975))$stat
plot(seq(500, 2500, 10), psi[2,], type = "l", ylim = c(0,1), xlab = "Elevation (m)", lwd = 2,
ylab ="Probability of occupancy")
lines(seq(500, 2500, 10), psi[1,], type = "l", lty = "dashed", lwd = 2)
lines(seq(500, 2500, 10), psi[3,], type = "l", lty = "dashed", lwd = 2)
abline( v = mean(zj$elevationMax), lwd = 2)
text(1300, .9, "Optimum elevation", cex = .8)
hist(zj$elevationMax, breaks = 100, main = "", xlab = "Elevation (m)", xlim = c(1500, 2500),
freq = FALSE, col = "gray")
abline(v = summary(zj$elevationMax, quantile, c(.975))$stat, lty = "dashed", lwd = 2)
abline(v = summary(zj$elevationMax, quantile, c(.025))$stat, lty = "dashed", lwd = 2)
Fig.2. Probability of occupancy at the mean forest cover as a function of elevation (left panel) and the posterior density of the optimum elevation at the mean forest cover (right panel). Dashed give are .025 and .975 quantiles, also known as .95 equal-tailed Bayesian credible intervals.
Royle, J.A., and R.M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology: The Analysis of Data from Populations, Metapopulations, and Communities. Academic Press, London, United Kingdom.